A SIMULATION APPROACH TO OPTIMAL STOPPING UNDER PARTIAL 

INFORMATION 

MIKE LUDKOVSKI 

Abstract. We study the numerical solution of nonlinear partially observed optimal stopping prob- 
lems. The system state is taken to be a multi-dimensional diffusion and drives the drift of the 
observation process, which is another multi-dimensional diffusion with correlated noise. Such models 
where the controller is not fully aware of her environment are of interest in applied probability and 
financial mathematics. We propose a new approximate numerical algorithm based on the particle 
filtering and regression Monte Carlo methods. The algorithm maintains a continuous state-space 
and yields an integrated approach to the filtering and control sub-problems. Our approach is entirely 
simulation-based and therefore allows for a robust implementation with respect to model specification. 
We carry out the error analysis of our scheme and illustrate with several computational examples. 
An extension to discretely observed stochastic volatility models is also considered. 



1. Introduction 

Let (rj,^, {J^t),^) be a filtered probability space and consider a d-dimensional process X = (Xt) 
satisfying an Ito stochastic differential equation (SDE) of the form 

(1) dXt = h{Xt) dt + aiXt) dUt + aiXt) dWt, 

where U and W are two independent (^4)-adapted Wiener processes of dimension du and dw re- 
spectively. Let y be a dy = djy-dimensional diffusion given by 

(2) dYt = h{Xt) dt + dUf 

Assumptions about the coefficients of (l)-(2) will be given later. Denote by TY = cr(Ys: < s < t) 
the filtration generated by Y . We study the partially observed finite horizon optimal stopping 
problem 

(3) sup ¥.[g{T,Xr,Yr)\, 

T<T, .T^'*' -adapted 

where 5 : [0, T] x M'^ x M.^^ ^ M is the reward functional. 

The probabilistic interpretation of (3) is as follows. A controller wishes to maximize expected 
reward g{t, x, y) by selecting an optimal stopping time r. Unfortunately, she only has access to the 
observation process Y] the state X is not revealed and can be only partially inferred through its 
impact on the drift of Y . Thus, r must be based on the information contained solely in Y . Recall 
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that even when Y is observed continuously, its drift is never known with certainty; in contrast the 
instantaneous volatihty of Y can be obtained from the corresponding quadratic variation. 

Such partiahy observed problems arise frequently in financial mathematics and applied probability 
where the agent is not fully aware of her environment, see Section 1.1 below. One of their interesting 
features is the interaction between learning and optimization. Namely, the observation process Y 
plays a dual role as a source of information about the system state X, and as a reward ingredient. 
Consequently, the agent has to consider the trade-off between further monitoring of Y in order to 
obtain a more accurate inference of X, vis-a-vis stopping early in case the state of the world is 
unfavorable. This tension between exploration and maximization is even more accentuated when 
time-discounting is present. Compared to the fully observed setting, we therefore expect that partial 
information would postpone decisions due to the demand for learning. 

In the given form the problem (3) is non-standard because the payoff g{t, Xt, Yt) is not adapted to 
the observed filtration {J-^) and, moreover, Y is not Markovian with respect to {J'Y)- This difficulty 
is resolved by a two-step inference/optimization approach. Namely, the first filtering step transforms 
(3) into an equivalent fully-observed formulation using the Markov conditional distribution vrj of Xt 
given J^J . In the second step, the resulting standard optimal stopping problem with the Markovian 
state (7r4,lt) is solved. 

Each of the two sub-problems above are covered by an extensive literature. The filtering problem 
with diffusion observations was first studied by Kalman and Bucy [21] and we refer to the excellent 
texts [2, 20] for the general theory of nonlinear stochastic filtering. The original linear model of [21] 
had a key advantage in the availability of sufficient statistics and subsequent closed-form filtering 
formulas for vr^. Other special cases where the filter was explicitly computable were obtained by [1] 
and [4]. However, in the general setup of (l)-(2), the conditional distribution vTf of Xt is measure- 
valued, i.e. an infinite-dimensional object. This precludes consideration of explicit solutions and 
poses severe computational challenges. 

To address such nonlinear models, a variety of approximation tools have been proposed. First, 
one may linearize the system (l)-(2) by applying (A) the extended Kalman filter [18, 23]. Thus, the 
conditional distribution of X is summarized by its conditional mean mt = E[Xt|.7-"^^] and conditional 
variance Pt = IE[(^t — mt)'^\J-Y]- One then derives (approximate) evolution equations for {mt,Pt) 
given observations Y. More generally, vr^ can be parameterized by a given family of probability 
densities, yielding the (B) projection filter. Let us especially single out the exponential projection 
methods studied by Brigo et al. [4, 5]. Third, the state space of nt can be discretized through (C) 
optimal quantization methods [35, 36]. This replaces (ttj) by a non-Markovian approximation (vrj) 
whose transition probabilities are pre-processed via Monte Carlo simulation. Fourth, one may apply 
(D) Wiener chaos expansion methods [28, 27, 32] that reduce computation of irt to a solution of 
SDK's plus ordinary differential equation systems. Finally, (E) interacting particle systems have 
been considered to approximate vr^ non-parametrically via simulation tools [7, 8, 9, 12]. 

The optimal stopping sub-problem of the second step can again be tackled within several frame- 
works. When the transition density of the state variables is known, classical (a) dynamic programming 
computations are possible, see e.g. [38]. If the problem state is low-dimensional and Markov, one 
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may alternatively use the quasi-variational formulation to obtain a free-boundary partial differential 
equation (pde) and then implement a (b) numerical pde solver for an efficient solution. Thirdly, 
(c) simulation-based methods [13, 26, 40] that rely on probabilistic Snell envelope techniques can be 
applied. 

The joint problem of optimal stopping with partial observations was treated in [16], [17], [29], 
[36] and [33]. All these models can be viewed as a combination of the listed approaches to the 
two filtering/optimization sub-problems. For example, [29] proposes to use the assumed density 
filter for the filtering step, followed by a pde solver for the optimization. This can be summarized as 
algorithm (B) / (b) in our notation. Meanwhile, [36] use (C) / (a) , i.e. optimal quantization for the filter 
and then dynamic programming to find optimal stopping times. Methodologically, two ideas have 
been studied. First, using filtering techniques (A) or (B), one may replace vrj by a low-dimensional 
Markovian approximation tt^. Depending on the complexity of the model, algorithms (a) or (b) can 
then be applied in the second step. Unfortunately, the resulting filtering equations are inconsistent 
with the true dynamics of vr^, and require a lot of computations to derive them for each considered 
model. The other alternative is to use the quantization technique (C) which is robust and produces 
a consistent (but non-Markovian) approximation ift. Since the state space of ift is fully discretized, 
the resulting optimal stopping problem can be solved exactly using dynamic programming algorithm 
(a). Moreover, tight error bounds are available. The shortcomings of this approach are the need to 
discretize the state space of X and the requirement of offline pre-processing to compute the transition 
density of vft. 

In this paper we propose a new approach of type (E) / (c) that uses a particle filter for the inference 
step and a simulation tool in the optimization step. Our method is attractive based on three accounts. 
Firstly, being entirely simulation-based it can be generically applied to a wide variety of models, with 
only minor modifications. In particular, the implementation is robust and requires only the ability to 
simulate {Xt, Yt). For comparison, free boundary pde solvers of type (b) often use advanced numerical 
techniques for stability and accuracy purposes and must be re-programmed for each class of models. 
Also, in contrast to optimal quantization, no pre-processing is needed. Moreover, the interacting 
particle system approach to filtering is also robust with respect to different observation schemes. In 
the original system (l)-(2) it is assumed that Y is observed continuously. It is straightforward to 
switch our algorithm to discrete regularly-spaced observations of Y that may be more natural in 
some contexts. 

Secondly, our approach maintains a continuous state space throughout all computations. In par- 
ticular, the computed optimal stopping rule r* is continuous, eliminating that source of error and 
leading to a more natural decision criteria for the controller. Thus, compared to optimal quanti- 
zation, our approach is expected to produce more "smooth" optimal stopping boundaries. Third, 
our method allows the user to utilize her domain knowledge during the optimization step. In most 
practical applications, the user already has a guess regarding an optimal stopping rule and the nu- 
merical computations are used as a refinement and precision tool. However, most optimal stopping 
algorithms rely on a "brute force" scheme to obtain an optimal stopping rule. By permitting custom 
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input for the optimization step, our scheme should heuristically lead to reduced computational efforts 
and increased accuracy. 

Finally, maintaining the simulation paradigm throughout the solution allows us to integrate the 
filtering and Snell envelope computations. In particular, by carrying along a high-dimensional ap- 
proximation of TTf, the initial filtering errors can be minimized in a flexible and anticipative way 
with respect to the subsequent optimization step. Thus, the introduction of filtering errors is de- 
layed for as long as possible. This is important for optimal stopping where the forward-propagated 
errors (such as the filtering error) strongly affect the subsequent backward recursion solution for r*. 
To summarize, our scheme should be viewed as an even more flexible alternative for the optimal 
quantization method of [36]. 

Remark 1. To our knowledge the idea of integrated stochastic filtering and optimization was con- 
ceived in [34], in the context of utility maximization with partially observed state variables. Muller 
et al. [34] proposed to use the Markov Chain Monte Carlo (MCMC) methods and an auxiliary ran- 
domized pseudo-control variable to do both steps at once. These ideas were then further analyzed 
in [3, 41] for a portfolio optimization problem with unobserved drift parameter and unobserved sto- 
chastic volatility, respectively. In fact, Viens et al. [41] utilized a particle filter but then relied on 
discretizing the control and observation processes to obtain a finite-dimensional problem with discrete 
scenarios. While of the same flavor, this approach must be modified for optimal stopping problems 
like (3), as the control variable r is infinite-dimensional. Indeed, stopping rules r are in one-to-one 
correspondence with stopping regions, i.e. subsets of the space-time state space. Such objects do not 
admit easy discretization. Moreover, the explicit presence of time-dimension as part of our control 
makes MCMC simulation difficult. Thus, we maintain the probabilistic backward recursion solution 
method instead. 

The rest of the paper is organized as follows. In Section 2 we recall the general filtering paradigm 
for our model and the Snell envelope formulation of the optimal stopping problem (3). Section 3 
describes in detail the new algorithm, including the variance-minimizing branching particle filter in 
Section 3.1, and the regression Monte Carlo approach to compute the Snell envelope in Section 3.2. 
We devote Section 4 to the error analysis of our scheme and to the proof of the overall convergence of 
the algorithm. Section 5 then illustrates our scheme on a numerical example; a further computational 
example is provided in Section 6 which discusses the extension of our method to discretely observed 
stochastic volatility models. Finally, Section 7 concludes. 

Before proceeding, we now give a small list of applications of the model (l)-(3). 

1.1. Applications. 

Optimal Investment under Partial Information. The following investment timing problem arises in 
the theory of real options. A manager is planning to launch a new project, whose value (Yf) evolves 
according to 

dYt = Xtdt + ay dUt, 
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where the drift parameter {Xt) is unobserved and {Ut) is an M-valued Wiener process. The en- 
vironment variable Xt represents the current economic conditions; thus when economy is booming, 
potential project value grows quickly, whereas it may be declining during a recession. At launch time 
r the received profit g{-) is a function of current project value Yr, as well as extra uncertainty that 
depends on the environment state. For instance, consider g{-) = Yr ■ (ao + aiX^- + boe), e ~ AA(0, 1) 
independent, where the second term models the profit multiplier based on economy state. Con- 
ditioning on the realization of e, expected profit is g(T, Xr,Yr) = Yr{ao + aiXr). Such a model 
with continuous-time observations was considered by [11] in the static case where Xq G {0, 1} and 
dXf = 0. A similar problem was studied in [31] with an additional consumption control. 

Using the methods below, we can treat this problem for general X-dynamics of the type (1), under 
both continuous and discrete observations. 

Stochastic Convenience Yield Models. Compared to holding of financial futures, physical ownership of 
commodities entails additional benefits and costs. Accordingly, the rate of return on the commodity 
spot contract will be different from the risk-free rate. The stochastic convenience yield models [6, 37] 
postulate that the drift of the asset price (1^) under the pricing measure F is itself a stochastic 
process, 

dYt = Yt{Xtdt + aY dUt), 
dXt = h{Xt) dt + a{Xt) dUt + (Tx{Xt) dWt. 
One may now consider the pricing of American Put options on asset Y with maturity T and strike 

supE[e-^^(E:-yO+], 

where the convenience yield X is unobserved and must be dynamically inferred. Beyond using Y to 
learn about X^, it is also possible to filter other observables, e.g. futures contracts, see [6]. 

Reliability Models with Continuous Review. Quality control models in industrial engineering [19] can 
also be viewed as examples of (3). Let Xt represent the current quality of the manufacturing process. 
This quality fluctuates due to machinery state and also external disturbances, such as current work- 
force effort, random shocks, etc. When quality is high, the revenue stream Y is increasing; conversely 
poor quality may decrease revenues. Because revenues are also subject to random disturbances, cur- 
rent quality is never observed directly. In this context, it is asked to find an optimal time r to replace 
the machinery (at cost g{Xt)) and reset the quality process (Xt). Assuming "white noise" shocks to 
the system and continuous monitoring of revenue stream this leads again to (l)-(2)-(3). The case 
where Y is discretely observed and X is a finite-state Markov chain was treated by Jensen and Hsu 
[19]. 

2. Optimization Problem 

2.1. Notation. We will use the following notation throughout the paper: 

• For X G M, we write x = [x\ + {x} to denote the smallest integer small than x and the 
fractional part of x, respectively. 
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• 5x denotes the Dirac measure at point x. 

• denotes the space of ah real-valued, bounded, continuous functions with bounded contin- 
uous derivatives up to order k on M'^. We endow C^(M'^) with the following norm 

||/lk,oo = Yl / e ^bi^''), rn<k, 

where a = (ai, . . . , Ud) is a multi-index and derivatives are written as Daf = d"^ ■ ■ ■ d'^'^ f . 

• = {/• -Oq/ ^ L^iW^), \a\ < k} denotes the Sobolev space of functions with p-integrable 
derivatives up to order k. 

• 'P{R'^) is the space of all probability measures over the Borel cr-algebra B{W^). For /u G 

= f^d f{x)^{dx). We endow V with the weak topology; ^„ — > /i weakly if V/ G C^, 
/^n(/) Kf)- 



2.2. Filtering Model. In this section we briefly review the theory of nonlinear filtering as applied 
to problem (3). We follow [7] in our presentation. 

Before we begin, we make the following technical assumption regarding the coefficients in (1) and 
(2). 

Assumption 1. The coefficients of (1) satisfy: b{x) G C^{W^),a{x) G C^{W^'"^Y),a{x) G Cf{R'^'"^^) 
and moreover, a and a are strictly positive-definite matrices of size d x dy and d x dw respectively. 
Similarly, in (2), h{x) G C^{M.'^). 

This assumption in particular guarantees the existence of a unique strong solution to (1), (2). We 
also assume that 

Assumption 2. The payoff function g is bounded and twice jointly continuously differentiahle g G 
C^2([o,T] xW^x w^y). 

The latter condition is often violated in practice where payoffs can be unbounded. However, one 
may always truncate g at some high level N without violating the applicability of the model. 

We begin by considering the conditional distribution of X given . Namely, for / G C'^{W^) 
define 

(4) ^tf ^nf{Xt)\j^l]- 

It is well-known [2] that TTf/ is a Markov, .F^-adapted process that solves the Kushner-Stratonovich 
equation 

dy 

(5) d{^tf) = TTtiAf) dt + Y, [^t{hk ■ f) - TTtihk) ■ Mf) + MB'f)] [dY^'^ - TTtihk) dt], 

k=l 
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where the action of the differential operators A and on a test function / G C^(W^) is defined by 



(6) 



^ d / dij dw \ d 

^fix) - '^Oiikix)ajkix) + '^crik{x)ajkix) didjf{x) +^bi{x)dif{x), 



i \k=l 



k=l 



i=l 



B^f(x)^Y.'^ik{x)dif[x). 



i=l 



Thus, Tit is a probabihty measure-valued process solving the stochastic partial differential equation 
(spde) corresponding to the adjoint of (5). To avoid the nonlinearities in (5), a simpler linear version 
is obtained by utilizing the reference probability measure device. Define a P-equivalent probability 
measure P by 

/ du ,.T du rp \ 

Ct ^ exp ( - ^ hkiXs) dU^ - 2 E ^'(^^) J ■ 



dF 
dF 



(7) 

(liT :/-rr \ , , , _ 

k=l-''' k=l 

From the Girsanov change of measure theorem (recall that h is bounded so that ^[Ct] = 1), it follows 
that under P the observation y is a Brownian motion and the signal X satisfies 

(8) dXt = {h{Xt) - ah{Xt)) dt + a{Xt) dYt + a{Xt) dWf 
We now set 

(9) ptf ^t[f{Xt)C^\j^j] , 

with (t defined in (7). Then by Bayes formula, ttj/ = ^ and moreover, ptf solves the linear 
stochastic partial differential equation 

dy 

(10) 

fe=l 

with A,B'' from (6). The measure-valued Markov process pt is called the unnormalized conditional 
distribution of X and will play a major role in the subsequent analysis. Under the given smoothness 
assumptions, it is known [2] that vrt (and pt) will possess a smooth density in Wp for all ]5 > 1 and 
t > 0. 

Returning to our optimal stopping problem (3), let us define the value function V by 
V{t, i, y- T) ^ sup E [g{T, X^, Yr) \Xt^tYt = y]. 

T<T 

Economically, V denotes the optimal reward that can be obtained on the horizon [t, T] starting with 
initial condition Yt = y and Xt ~ ^. Using conditional expectations we may write, 

V{t,^,y) = sup E''^'y[Tr^g{T,;Yr)] 

t<T<T 



d{ptf) = pt{Af) dt + Y, [ptihkf) + PtiB'^f)] dY^ 



sup &'y^^[pr9iT,;Yr)] 

t<T<T 



(11) 



sup &'^'y[G{T,pr,Yr)], where G{t,^,y) ^ [ g(t,x,y)i{dx), 

t<T<T jRd 
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and where E*'^'^ denotes IP-expectation conditional oiiYt = y,pt = ^. 

Equation (11) achieved two key transformations. First, its right-hand-side is now a standard 
optimal stopping problem featuring the Markov hyperstate {pt,Yt). Secondly, (11) has separated the 
filtering and optimization steps by introducing the fully observed problem through the new state 
variable pt- However, this new formulation remains complex as pt is an infinite-dimensional object. 
With a slight abuse of notation, we will write V{t, pt,Yt) to denote the value function as a function 
of the current unnormalized distribution pf. 

As can be seen from the last two lines of (11), one may solve (3) either under the original physical 
measure P using TTt, or equivalently under the reference measure P using pt. In our approach we will 
work with the latter formulation due to the simpler dynamics of pt and more importantly due to 
the fact that under P one can separate the evolution of Y and X. In particular, under P, y is a 
Brownian motion and can be simulated entirely on its own. In contrast, under P, the evolutions of 
TTt and Y are intrinsically tied together due to the joint (and unobserved) noise source (Ut). 

2.3. Snell Envelope. Let us briefly summarize the Snell envelope theory of optimal stopping in our 
setting. All our results are stated under the P reference measure, following the formulation in (11). 
For any .F^-stopping time cr, define 

(12) Z{a)= sup E[G{T,pr,Yr)\T^]. 

a<T<T 

Proposition 1 ([29]). The set form a supermartingale family, i.e. there exists a continuous 
process Z, such that Z{(j) = Z^, Z stopped at time cr. Moreover, an optimal time r for (11) exists 
and is given by t = inf{t: Zt = G{t, pt,Yt)} . 

The above process Z is called the Snell envelope of the optimal stopping problem (11). The 
proposition implies that to solve (11) it suffices to compute the Snell envelope Z. We denote by 
t < Tf < T the optimal r achieving the supremum in Zt = K[G{Tt , p-^* ,Yt-*)\J-Y]- By virtue of the 
(strong) Markov property of (pt, Yf) and the fact that pt is a sufficient statistic for the distribution of 
Xt\J-'^ it follows that V{t, pt, Yt) = su'Pt<r<T IE[G(r, pr, Yt)\J-'^] = Zt and (3) is equivalent to finding 
Tq above. Mazziotto [29] also gave a formal proof of the equivalence of the Snell envelopes under P 
and P that we discussed in the end of the previous section. 

To make computational progress in computing Tq, it will be eventually necessary to discretize 
time. Thus, we restrict possible stopping times to lie in the set = {0, At, 2At, . . . , T}, and label 
the corresponding value function (of the so-called Bermudan problem) as 

V^{t,C,y) = sup{E^'^'y[G{T,pr,Yr)] : T is ^'^-valued , J^^-adapted}. 

In this discrete version, since one either stops at t or waits till t + At, the dynamic programming 
principle implies that the Snell envelope satisfies 

(13) V^{t,pt,Yt)=ma^(G{t,pt,Yt),E[V'^{t + At,pt+At,Yt+At)\:F^]). 
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2.4. Continuation Values and Cashflow Functions. For notational convenience, we now write 
Zt = it,pt,Yt) and Gt = G{Zt). Let 



q, = qt{Zt)^nV'^{Zt+Mm]. 

denote the continuation value. Then the Snell envelope property (13) implies that qt satisfies the 
recursive equation 



(14) qt=^ [max(Gf+At, qt+At)\Tl] 
The optimal stopping time also satisfies a recursion, namely 

(15) = Tl^M'^{qt>Gt} + *l{gt<Gt}- 

In other words, when the continuation value is bigger than the immediate expected reward, it is 
optimal to wait; otherwise it is optimal to stop. Equation (15) also highlights the fact that the 
continuation value qt serves as a threshold in making the stopping decision. Associated with a 
stopping rule r* defined above is the future cashflow function. Denote Bt{q) = ^{qt<Gt} its 
complement by Bf{q) = 1 — Bt{q), and starting from the timepoint t, define the expected future 
cashflow as 



(16) = T.G{Zs)1b^^.j,.^^^,„.b, 



s=t 



'dt{q) is a path function whose value depends on the realization of {Zt) between t and T, as well as 
the threshold function q. Note that (16) can be defined for any threshold rule by simply using 
Bt{q') instead. In discrete time using the fact that is an .F^-stopping time and (15) we get 

gi(Zt) = E[y'^(Zi+At)|^r] = E[G(r;+^i,p.*_^^^,y.;^^j|.F,^] 



(17) =t 



E G(.,p„y.)i^..^^^=.} 



.s=t+At 



It follows that knowing 'i?(9), one can back-out the continuation values q and then recover the value 
function itself from V^{Zt) = max{G{Zt),q{Zt)). In particular, for t = 0, we obtain V^{0, £,o,yo) = 
max(G(Zo), go('^o))- The approximation algorithm will compute q and the associated by repeatedly 
evaluating the conditional expectation in (17) and updating (16). The advantage in using cashflows 
??(g) rather than q itself is that an error in computing q is not propagated backwards unless it leads 
to a wrong stopping decision for (15). As a result, the numerical scheme is more stable. 

Remark 2. EglofF [13] discusses a slightly more general situation, where the look-ahead cashflows "i? 
are taken not on the full horizon [t, T] but only some number w of steps ahead. This then produces 

t+wAt 

(18) '&t,w{q){'Z') = ^ G{Zs) ■ Ib^B^_^^_^...-Bs + It+wAtiZt+wAt) ■ lB?Bf+At--'Br+„At' 

s=t+At 
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and one still has qt{Zt) = K['&t+i,w{q){'^)\^t] w = 0, . . . ,T — t — 1. In particular, the case 

= is the Tsitsiklis-van Roy [40] algorithm, 

(19) A,o{q) = G{Zt)lB, + qt{Zt)lBi = max{Gt, qt). 

To compute (17), the corresponding conditional expectation will be approximated by a finite- 
dimensional projection Ti. Indeed, by definition of conditional expectation with respect to the 
Markov state {pt,Yt), we have qt{Zt) = K[^t-^-At{q){'^)\^t"] — F'{pt,Yt) for some function F. Let 
{Bj)J^Y be a (Schauder) basis for the Banach space M+ x V{W^). Then as r ^ oo, F (and qt) can 
be approximated arbitrarily well by the truncated sum 

r 

(20) qt{Zt) ^ qtiZt) ^ ^jBjiPu Yt) = pr^ ot[9t+Mm\J'Ii 

i=i 

where the projection manifold (or architecture) \s7i = span(Bj{S,,y),j = 1, . . . ,r). As long as (20) 
does not modify much the resulting stopping sets Bt{q), one expects that the resulting cashflow 
function i?(g) will be close to the true one "!?((?). In our filtering context, the extra modification is 
that Zt must itself be approximated by a finite-dimensional filter Z". However, if the approximation 
is high-dimensional, then it should have very little effect on the projection step of the Snell envelope 
in (20). 

2.5. Analytic Approach. The analytic approach to optimal stopping theory characterizes the value 
function V{t, ^, y) in terms of a parabolic-type free boundary problem. This is in direct counterpart 
to standard optimal stopping problems for diffusion models. 

The major difficulty is the infinite-dimensional nature of the state variable tt. Limited results exist 
for the corresponding optimal stopping problems on Polish spaces, see e.g. [30, 29]. In particular, 
[30] characterize V as the minimal excessive function dominating G in terms of the (Feller) transition 
semigroups of (vrj,!^). A more direct theory is available when TVt £ H belongs to a Hilbert space; 
this will be the case if (and therefore nt for all t) admits a smooth L'^-density. Even then, 
since the smoothness properties of V with respect to ^ are unknown, one must work with viscosity 
solutions to second-order pdes as is common in general stochastic control. The following proposition 
is analogous to Theorem 2.2 in [16]. Denote by D the Frechet derivative operator and for a twice 
Frechet differentiable test function (j)(t, ^, y) let 

(21) C4> = ^ti {{aa^ + aa^)Dl^^) + {b, D^cf)) + hdycf) + aZ)^./. • dycj), 

with (•, •) denoting the inner product in H be the infinitesimal generator of the Markov process 

Proposition 2. The value function V{t,7r,y) is the unique viscosity solution of 

r vt + cv<o, 

(22) <^ 

\V{t,Tr,y)>G{t,Tr,y). 

Moreover, V is bounded and locally Lipschitz (with respect to the Hilbert norm). 
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In principle the infinite-dimensional free boundary problem (22) can be tackled by a variety of 
numerical methods including the projection approach that passes to a finite-dimensional subset of 
L2(M'^). 



3. New Algorithm 

In this section we describe a new numerical simulation algorithm to solve (11). This algorithm 
will be a combination of the minimal-variance branching particle filter algorithm for approximating 
TTt and pt, described in Section 3.1, and the regression Monte Carlo algorithm described in Section 
3.2. 

3.1. Particle Filtering. The main idea of particle filters is to approximate the measure- valued 
conditional distribution ttj by a discrete system of point masses that follows a mutation-selection 
algorithm to reproduce the dynamics of (10). In what follows we summarize the particular algorithm 
proposed in [7, 9, 8]. We assume that we are given (l)-(2) with continuous observation of (Yt). Fix 
n > 0; we shall approximate vrt by a particle system tt" of n particles. The interacting particle system 
consists of a collection of n weights a"(t) and corresponding locations v'j{t) G W^, j = 1, . . . ,n. We 
think of Vj' as describing the evolution of the n-th particle and of (i) G as its importance in 
the overall system. Begin by initializing the system by independently drawing Vj'(0) from the initial 
distribution Xq ~ and taking (0) = 1 Vj. Let 5 be a parameter indicating the frequency of 
mutations; the description below is for a generic time step t G [m5,{m + 1)5), assuming that we 
already have v^{m6) and a^{m6) = 1. 
First, for m5 < t < {m + 1)6 we have 

(23) v]{t) = v]{m6) + r (6 - ah){v'^{s)) ds + f a{v'^{s)) dYs + f a{vj{s)) dW^^\ 

where W^^^ are n independent P- Wiener processes. Thus, each particle location evolves independently 
according to the law of X under P. The unnormalized weights «"(•) are given by the stochastic 
exponentials 

(24) 



{t) = l + J2 [ a]{s)hk{vj{s)) dY^ = exp / hk{v^{s)) dY^ -\Y. I ^fcK(^))' 



Let 

«-((^ + 1)6-) ^ , g (o, i) 



, a^"((m + l)J-) 
a^j{{rn + 1)6- 



denote the normalized weights at the next mutation time. Then at t = (m ^1)6 each particle 
produces o"((m-|- 1)6) offspring inheriting the parent's location, with the branching carried out such 
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that 



(25) 



n 

J]o,"((m + l)<5) = n, 
I i=i 



[na^{{m + l)5-)\ with prob. 1 - {ncJ^{{m + 1)5-)], 
1 + [na"((m + l)5-)\ with prob. {naj{{m + 1)5-)}, 



where {x} denotes the fractional part of x G M. Note that the different o"'s are correlated so that 
the total number of particles always stays constant at n. One way to generate such 's is given in 
the Appendix of [7]. Following the mutation, particle weights are reset to a^((m + 1)5) = 1 and one 
proceeds with the next propagation step. 

With this construction we now set for 1715 < t < {m + 1)5, 



(26) 



m 1 ^ n 



i=i 



■ iy-jZa^{t)5.^it){-)^ 



Interpreted as a probability measure on W^, vr" (p") is an approximation to the true vrt (resp. pt) as 
indicated by the following 

Proposition 3 ([7], Theorem 5). There exist constants Ci{t),C2{t) such that for any f G C^(M'^), 



(27) 



E 



Ptf-Ptf,2] < Ci{t) 



Ptl 



n 



which in turn implies that (since is bounded) 



(28) 

withCi{t) = 0{e^-t). 



E 



< 



C2{t) 



n 



2 



2 

l,oo 1 



Similar results can be obtained under the assumption that Y is observed discretely every 5 time 
units. In that case one simply takes, 

a«((m + 1)5-) = exp hk{vj{m5)) ■ {Vf^^,^, - Y^,) - \ ^ h,{v]{m5))' ■ s] , 



\k=l 



k=l 



with the rest of the algorithm remaining unchanged. 

The use of discrete point masses in the interacting particle filter renders the analytical results based 
on Hilbert-space theory (e.g. (22)) inapplicable. This can be overcome by considering regularized 
particle filters [24], where point masses are replaced by smooth continuous distributions and the 
particle branching procedure switches back to a true re-sampling step. 
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3.2. Regression Monte Carlo. The main idea of our algorithm is to simulate N paths of the 
Z process (or rather the particle approximation {Z"")), yielding a sample (z^), k = 1,2, . . . , N, 
t = 0, At, . . . ,T. To simulate (zj), we first simulate the Brownian motion (1^) under P, and then 
re-compute along the simulated paths as described in the previous subsection. Using this sample 
and approximation architectures Tit of (20), we approximate the projection pr-^ through an empirical 
least-squares regression. Namely, an empirical continuation value is computed according to 

1 ^ 

(29) qt = argmin - ^ \f{zi) - mzl+At)\' ^ Wn oE[{}^^^,{q){^-)\^n, 

where 'd^ is the empirical cashflow function along simulated paths obtained using the future ^'s. One 
then updates pathwise "d^ and r using (16) and (15) respectively and proceeds recursively backwards 
in time. This is the same idea as the celebrated regression Monte Carlo algorithm of Longstaff and 
Schwartz [26] . The resulting error between q and the true q will be studied in Section 4 below. 

Many choices exist regarding the selection of basis functions Bj{pt, Yt) for the regression step. As 
a function of y, one may pick any basis for (M'^'^ , P) , e.g. the Laguerre polynomials. As a function 
of p, a natural probabilistic choice involves the moments of Xt\J-J , i.e. ai{ptx^). It is also known 
that using a basis function of the form EUR{z) = 'Kt[G{ZT)] (the conditional expectation of the 
terminal reward or the "European" counterpart,) is a good empirical choice. 

Remark 3. If one only uses the first two conditional moments of X, ptx and ptx^ inside the basis 
functions, then our algorithm can be seen as the non-Mar kovian analogue of applying the Extended 
Kalman filter for the partial observations of X and then computing the (pseudo)-Snell envelope of 
(3). In that sense, our approach generalizes all the previous filtering projection methods [5, 23] for 
(3). 

3.3. Overall Algorithm. For the reader's convenience, we now summarize the overall algorithm 
for solving (3). 

• Select model parameters N (number of paths); n (number of particles per path); At (time 
step for Snell envelope); 5 (time step for observations and particle mutation); Bi (regression 
basis functions); r (number of basis functions). 

• Simulate N paths of (y^) under P (which is a Brownian motion) with fixed initial condition 

• Given the path {yt), use the particle filter algorithm (23)-(24)-(26) to compute p^'^ along 
that path, starting with p^'^ ~ ^o- 

• Initialize q^{T) = ^^'^{q) = G{z^), t^{T) = T, k = 1, . . . , N. 

• Repeat for t = {M - 1) At, . . . , At, 0: 

— Evaluate the basis functions Bi{z^), for i = 1, . . . ,r and k = 1, . . . ,N . 

— Regress 

af 4 argmin J^k?;^^^,(g)-J^a^i?,(zf) . 

"SK" k=l i=l 

— For each k = \, . . . ,N the following steps: Set g'^(t) = ^[^^ 

ort'B..{z\). 



14 



MIKE LUDKOVSKI 



- Compute G{z^) 

— Update '&^'^{q) 



- Update r'=(t) 




r''(t + At) otherwise. 



• End loop; 




total memory requirements are 0{N ■ M ■ r). In terms of number of operations the overall algorithm 
complexity is 0{M ■ N ■ (n? + r^)), with the most intensive steps being the resampling of the filter 
particles and the regression step against the r basis functions. 



This section is devoted to the error analysis of the algorithm proposed in Section 3.3. Looking 
back, our numerical scheme involves three main errors. These are: 

• Error in computing pt which arises from using a finite number of particles and the resampling 
error of the particle filter 

• Error in projecting the cashflow function onto the span of basis functions TC and the 
subsequent wrong stopping decisions; 

• Error in computing projection coefficients a* due to the use of finite-sample least-squares 



We note that the filtering error is propagated forward, while the projection and empirical errors 
are propagated backwards. In that sense, the filtering error is more severe and should be controlled 
tightly. The projection error is the most difficult to deal with since we only have crude estimates 
on the dependence of the value function on pt. Consequently, the provable error estimates are very 
pessimistic. Heuristic considerations would imply that this error is in fact likely to be small. Indeed, 
the approximate decision rule will be excellent as long as ^{{qt > Gt} H {qt < Gt}) is small, since the 
given event is the only way that the optimal cashflows are computed incorrectly. By applying domain 
knowledge the above probability can be controlled through customizing the projection architecture 
Tit- For instance, as mentioned above, using EUR{z) as one of the basis functions is often useful. 

The sample regression error is compounded due to the fact that we do not use the true basis 
functions but rather approximations based on Z^. This implies the presence of error-in-variable 
during the regression step from the pathwise filtering errors. It is well-known (see e.g. [15]) that this 
leads to attenuation in the computed regression result, i.e. |a^'*| < |a*|. An extensive statistical 
literature treats error reduction methods to counteract this effect, a topic that we leave to future 
research. 

As a notational shorthand, in the remainder of this section we write to denote expectations 
(as a function on M*^ x M'^'*') conditional on It = y and pt = ^. We recall that the optimal cashflows 



4. Error Analysis 



regression. 
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satisfy 

while the approximate cashflows are 

gi = pr^oKt[^i^^,(g)(Z")]. 

Note that inside the algorithm, qt is evaluated not at the true value Zt = {pt,Yt), but at the 
approximate point Z". To emphasize the process under consideration we denote by = q'^{Z'^) 
the continuation function resulting from working with the Z"-process. Observe that the difference 
between and the true q is solely due to the inaccurate recursive evaluation of the reward G (since 
Y is simulated exactly); thus if the original reward g in (3) is independent of X then = q. 

The error analysis will be undertaken in two steps. In the first step, we consider the mean-squared 
error between the continuation value qt based on the true filter pt and the continuation value qf 
based on the approximate filter p^. In the second step, we will study the difference between and 
the approximate qt above. Throughout this section, || • ||2 = ]E[| • p]-^/^. 



Lemma 1. There exists a constant C(T), such that for all t <T 
(30) M'&t+Atiql - ^t+At{q)] 



{T-t).C{T) 
2- At../^ 




Proof. Suppose without loss of generality that q^{Zl^) > q{Zt). Let r be an optimal stopping time 
for the problem represented by g". Clearly such r is sub-optimal for q; moreover since both Z and 
are .F^-adapted, r is admissible for q. Therefore, 

(g"(Zr) - qiZt)f < Et[G{Z:^) - G{Zr)f 

[{G{Z^)-G{Zs))-l{r=s}\^ 

^ E ^•Ei[|G(Z«)-G(Z,)|2], 

s=t+M 

where the last line is due to Jensen's inequality. Averaging over the realizations of {Z^^ Zt) we then 
obtain 

T 

t[\q-{Z^)-q{Zt)\^]< Z__^.E[|G(Zr)-G(Z,)|2] 

s=t+/\t 



T 

s=t+M 



, (r-t)c(T) „ 2 _ (T-t)2-c(r) „ 2 

^ > - IIOIIi „ — - - IIOIIi 



using Proposition 3. 

Note that this error explodes as At due to the fact that we do not have tight bounds for 
M\G{Z^) - G(Z,)|2l|,^,}]. In general, one expects that MGiZ"^) - G{Zs)\''l ~ ]Et[|G(Z,") - 
G(Zs)P] • P(r = s) which would eliminate the At"^ term on the last line above. 

□ 
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In the second step we study the L^-difference of the unnormahzed continuation values, H^f — ^t||2 = 
]E[(g"(Z") — qt{Zt))'^Y^'^ ■ This total error can be decomposed as 

Ut-qth < ||pr^[o]Ei[^,%,(g)(Z")] -pr^ o]Ei[^i+At(g)(Z") 



(31) 



Si 

+ ||pr^ otti^t+^tmz^] -M^t+Atm^n]\\2+\\M^t+^^^ 

V ■' V 



l2 ■ 



£2 £3 

The three error terms Si on the right-hand-side of (31) are respectively the empirical error £1, the 
projection error £2, and the recursive error from the next time step 1S3. Each of these terms is 
considered in turn in the next several lemmas with the final result summarized in Theorem 1. The 
first two lemmas have essentially appeared in [13] and the proofs below are provided for completeness. 

Lemma 2 ([13, Lemma 6.3]). Define the centered loss random variable 

(32) 4(g)(Z") = \qt - ^t+Atm' - I pr^ o]Ei[^?t+At(g)] - ^t+At(g)p. 
Then 

(33) £f = \\qt-pTnoM'&t+At{q)]g < miq)^]. 
Proof. First note that 

(34) ||gt-pr^o]Et[t?t+At(g)]||i + ||pr7^oEt[i9i+At(g)] -]Et[T?t+At(g)]||i < \\qt - M^t+AtmWl 



because qt G TCt belongs to the convex space Tit, while pry^o¥,t['&t+Atiq)] S Tit is the projection of 
^t+At{q)- Therefore the three respective vectors form an obtuse triangle in L^: 



E 



{qt - Wn oM^t+Atiq)]) ■ {pin oEti^t+Atiq)] " Eti^t+Atiq)]) 



< 0. 



Direct expansion using the tower property of conditional expectations and the fact that qt € J-t 
shows that E[{qt - Etl'&t+Atiq)]) ■ (Eti-dt+Atiq)] - ^t+At{q))] = 0, so that 



Y 



(35) E 
Similarly, 



\qt-Et[^t+Atm\' 



+ E 



\Et[^t+Atm - ^t+At{q)\^] = i: [\qt " ^t+At{q)\' 



E 



(pr^ oEt[^t+Atm - M^t+Atiq)]) ■ {Et[^t+Atm - ^t+At{q)) 



0, 



and so 

(36) E \\ pr„ oEt[{>t+At{q)] - M^t+At 



E 



pr„ oEt[^t+At{q)] - ^t+At{q)(' 



-E 



\Et[^t+Atm - ^t+Atm' 



Combining (34)-(35)-(36) we find 



qt - pr^ o Et[i9t+At{q)] 



< E 



\qt - ^t+Atm^ - \M^t+Atm - ^t+Atm' 



[\pin oEt[^t+Atm - ^t+Atm^ - \Et[^t+Atm - ^t+Am'}] = mm^i]- 



□ 
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The above lemma shows that the squared error £l resulting from the empirical regression used 
to obtain qt (which recall is a proxy for E,t['dt+At{Q)]) can be expressed as the difference between 
the expected actual difference \qt — '&t+At{Q)\'^ versus the theoretical best average error after the 
projection | pr^^ oEt['&t+At{q)] - '&t+At{q)\'^ ■ 

Lemma 3 (cf. [13, Proposition 6.1]). We have £2 < 2\\Et[^t+At{q'') -^t+At{q)]h+^nf Wf-qth- 
Proof. We re- write, 

£2 = oEt[i9t+Atm - M^t+Atm\\^ < \\p^n oMi3t+Atm - PT^ oEt[^t^At{q")] 

+ ||pr„ oEt[^t+At{qn] - M^t+At{ql]\\^ + \\Et[^t+At{ql - ^t+At(g)]||2 

< 2\\Et[i^t+At{ql - ^t+Atmh + \\f -Et[^t+At{ql]\\ 

]E4^?t+At(g") - ^t+Atm + inf 11/ - (?ril2, 

where the second inequality uses the contraction property of the projection map pr-;.^ and the defini- 
tion of projection onto the manifold Tit- □ 

Lemma 4. We have for any p > 1 



(37) 



Et[T9t+At(<z")-^t+At(g)] < Yl \%-i''s\ 



s=t+At 



Proof. To simplify notation we drop the function arguments and also write qt+i,Gt+i, etc., to 
mean qt+At, etc. in the proof below. By definition of the cashflow function, £3 := ||Et[T?j4_i(g"') — 
^m(^)]||p = 

Et[Gt+il{gn^^<Gt+i} + ^?t+2(9")l{gJVi>Gt+i} - Gt+ilqt+-,<Gt+i - ^?t+2(g)l{gt+i>Gt+i}] 
Et[Gf+i(l|G,+i>gj^^J - l{Gt+i>qt+i}) + ^t+2{q'^)l{gl^^^>Gt+i} - ^t+2{q)l{qt+i>Gt+i}] 

< Pt[^i]||p + ||]Ei[gr+i(i{G.+i>,r+j - i{G.+,>4.+a) + ^t+2(g")i{G,+,<gr+j -^*+2(^)i{G.+,<4,+i}] 

where 

Ai = (Gf+i - gJYl) • (l{Gt+i>gr+J ~ l{Gi+i>gt+i}) 

= (Gt+l - qt+i) (l{gt+i>Gt+i>gr+i} ~ l{9r+i>Gt+i>gt+i}) 

< {qt+1 - gt+l)l{gt+i>Gt+i>g,"+J - (qt+l - %'+l)l{gr+i>Gt+i>gt+i} 

< \qt+i - qt+i\- 

For the remaining terms, using the fact that = ^['^t+'2{q"')\'^t+i\ obtain 



<?r+l(l{Gt+i>gJVJ - l{Gt+i>gt+i}) = ^?t+2(9")(l{Gt+i>9^V J " '^{Gt+i>qt+i}) 
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and therefore 



< 




p 





- ^?t+2(g)l{Gt+i«jt+i} 



+ 1{G 
+ 



t+i>9r+i} " 
M\qt+i 



1 



{Gt+i>qt+l} 



q't+il 



< \\qt+l-q?+i\\p+\\M{^tMql-^t+2{m{G,+,<4t+.}]\ 
<\\qt+l-q?+l\\p+\Wt+2{qn-^t+2{ 

follows. 



„n I 
is UP 



By induction, S:^ < Yls=t+i U' 

Based on Lemmas 1-2-3-4, we obtain the main 
Theorem 1. We have 

(38) \\qt{Z^) - qt{Zt)\\2 < 4(^-*)/^* max | inf ||/ - q^h + JtUq)] \ + 



□ 



t<s<T IfeHs 
Proof. Combining Lemmas 2-3-4 we find that 



C{T){T-t) 



At- 



n 



l5l|l,oo- 



\qt 



qn2<\/mm + wf-qth+s 



T 

E 

s=t+At 



Therefore, iterating 

\\qt-qth < 



EUq)] + inf 

J&rLt 



q:\\2. 



gri|2 + 3-{ VE[^i+At(g)]+ inf 11/ 

/e/tt+At 



+ 9 



/fcrtt+2At 



ii/-'zr+2Aji2 +■•• 



< 4(^-*)/^* max 
t<s<T 



E[/,(g)]+ inf 11/ -g; 



Finally, we have ||gt — ^t||2 < H?* — '7rll2 + Ikf — Qtib, and applying Lemma 1 the result (38) follows. □ 

4.1. Convergence. To obtain convergence, one proceeds as follows. First, taking n — > oo eliminates 
the filtering error so that Z" ^ Z and the corresponding errors in evaluating G vanish. Next, 
one takes oo, reducing the empirical error and the respective centered loss term E[/t(q)]. 

Thirdly, one increases the number of basis functions r — > oo in order to eliminate the projection 
error infjg-?^^ 11/ ~ Q^H- Finally, taking At — > we remove the Snell envelope discretization error. 

The performed error analysis shows the major trade-off regarding the approximation architectures 
l-if On the one hand, Jit should be large in order to minimize the projection errors minyg-^^ H/^^Fll- 
On the other hand, 1-it should be small to control the empirical variance of the regression coefficients. 
With many basis functions, one requires a very large number of paths to ensure that q is close to 
q. Finally, Ht should be smooth in order to further bound the empirical regression errors and the 
filtering error-in- variable accumulated when computing the regression coefficients. 

In the original finite-dimensional study of [13], the size of Ht was described in terms of the Vapnik- 
Cervonenkis (VC) dimensions nyc and the corresponding covering numbers. Using this theory, [13] 
showed that overall convergence can be obtained for example by using the polynomial basis for Ht 
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and taking the number of paths as = r'^+'^^ where r is the number of basis functions, d is the 
dimension of the state variable and k is the smoothness of the payoff function g € Wp. In the infinite- 
dimensional setting of our model, the VC-dimension is meaningless and therefore such estimates do 
not apply. One could trivially treat as an n-dimensional object, but then the resulting bounds 
are absurdly poor. It appears difficult to state a useful result on the required relationship between 
the number of basis functions and the number of paths needed for convergence. 

Remark 4. A possible alternative is to apply the Tsitsiklis-van Roy algorithm [40], which directly 
approximates qt (rather than i)) using the recursion formula (19): qt = ]Ej[max(G(Zi+At)i Qi^t+At))]- 
Like in Section 3, the approximate algorithm would consist in computing via regression Monte Carlo 
the empirical continuation value 

qt = pr^ oEt[max(G(Z,'V^,), qiZ^^^,))]. 

In such a case, the error between qt and qt admits the simpler decomposition (using max(a, b) < a + b) 



(39) 



\qtiZ^) - qt{Zt 



< 



pr^ oEt[max(G(Z;VAt)> ^(^r+At)] " oEi[max(G(Z;VAt)' ^(^r+A*))] 
pr^ o]Ei[max(G(Z,'V^,), g(Z«_,^J)] - ]Ei[max(G(Z,'V^,), ^(Z^VaJ)] 



MG{Z^^^^)-G{Zt+At)] 
Et[?(^r+At) - <i{Zt+At 



+ 



m{z^^^t)-<i{z^^At)] 



We identify the first two terms as the empirical E-i and projection 82 errors (as in Lemmas 2 and 3), 
the third term as the G-evaluation error, the fourth term as the next-step recursive error, and finally 
the last term as the sensitivity error of q with respect to Z. Controlling the latter error requires 
understanding the properties of the continuation (or value) function in terms of current state. This 
seems difficult in our infinite-dimensional setting and is left to future work. Nevertheless, proceeding 
as in the previous subsection and iterating (39), we obtain for some constants C3, C4 



\qt - qth < 



g3 • (T - 1) 
At 



max 

t<s<T 



inf 11/ 



g,||2 + \/EUq)] + ^||5||i,oo + MZ^) - q{Zs)h 



n 



so that the total error is linear rather than exponential in number of steps T/At as in Theorem 1. 
Even though this theoretical result appears to be better, empirical evidence shows that the original 
algorithm is more stable thanks to its use of t?. 



5. Examples 

To illustrate the ideas of Section 3 and to benchmark the described algorithm, we consider a model 
where an explicit finite- dimensional solution is possible. Let 

r dXt = -nXt dt + axipdWt + v^l - p2 dUt); 
\ dYt = {Xt -a)dt + ay dWt, 



(40) 
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with ([/, W) being two standard independent one-dimensional Brownian motions. Thus, F is a 
linear diffusion with a stochastic, zero-mean-reverting Gaussian drift X. We study the finite horizon 
optimal stopping problem of the form 

(41) Vit, y) = sup E''^^y[e-'^giXr, Yr)] ^ sup E*'^'?' [e-'^(Yr{ci + Xr) - 02)+] , a^R, 

T<T T<T 

which can be viewed as an exotic Call option on Y, see the first example in Section 1.1. Note that 
the payoff is guaranteed to be non-negative even if the controller stops when Yt-{ci + X^-) < C2- In 
this example, under the reference measure P, we have 

' dYt = aydWf, 

^ dXt = [-nXt - piaxlcyy) ■ {Xt - a)] dt + pax dWt + ^/l - pVx dW^^ ; 

dpAx) = -(TxPt [x] + Kxpt{x) + Kpt{x) + [ — 2 J ^Yt, 

2 ap ay 

where W-^ is a P- Wiener process independent of W. 

Below we carry out a numerical study with parameter values taken as 



Parameter 


K a ay 




T r 


P 


Cl 


C2 


Value 


2 0.05 0.1 


0.3 


1 0.1 


0.6 


1 


2 



Since on average Xt is around < a, y tends to decrease, so that in (41) it is optimal to stop early. 
However, the drift process X is highly volatile and quite often Xt > a produces positive drift for 
Y, in which case one should wait. Consequently, the stopping region will be highly sensitive to the 
conditional distribution ttj. 



5.1. Kalman Filter Formulation. The model (40) also fits into the Kalman-Bucy [21] filter frame- 
work. Thus, if the initial distribution Xq ~ Af{'mo, Pq) is a Gaussian density, then Xt\J-^ ~ 
J\f{mt,Pt) is conditionally Gaussian, where 

{dmt = -KTUt dt + {pax + Pt/i^Y) dWt, 
dPt = {-2KPt + a\- {pax + Pt/cTY?) dt 

Note that the conditional variance Pt is deterministic and solves the Riccati ode on the second line 
of (42). In (42), Vl^ is a P-Brownian motion, the so-called innovation process. Moreover, as shown 
by [25, Section 12.1], = so that we may equivalently write 

dYt = {nit — a)dt + aydWt- 



dWt 



dYt — {m-t — o) dt 



ay 
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The pair {mt,Pt) are sufficient statistics for the conditional distribution of Xt\J-Y and the corre- 
sponding payoff can be computed as 



n9{XuYt)\TY]=¥. 



(y(ci +mt + 

i ^2 



C2)- 



^'/2{((ci+mt)y 



where X ~ A/'(0, 1) 



'2tt 



C2) + yy/px^ dx 
+ ((ci + mt)y - C2) • (1 - $(x*)) =: G{mu P, Yt 



t), 



where x* 



— — isi^hll and is the standard normal cumulative distribution function. Thus, 

the original problem is reduced to 



(43) 



V{t,m,p,y) 



sup E 

T<T 



e-'^G{mr,Pr,Yr] 



mo = m, Pq = p,Yo = y 



This two-dimensional problem (recall that {Pt) is deterministic) can be solved numerically using a 
pde solver applied to the corresponding version of the free boundary problem (22). Namely V of 
(43) is characterized by the quasi-variational inequality 



(44) 



max|Vt -|- (m 



+ (pcrxCTY + Pt)ymy 

V{T,m,p,y) = G{m,p,y). 



KmVm + ^{pCTX + Pt/Tyf'VTr, 

rV, G{m, p, y) - V{t, m, y) | = 



0, 



5.2. Numerical Results. To benchmark the proposed algorithm we proceed to compare two so- 
lutions of (41), namely (i) a simulation algorithm of Section 3.3 and (ii) a finite-differences pde 
solver of (44). The Monte Carlo implementation used = 30 000 paths with n = 500, S = 0.01, 
At = 0.05 or twenty time-steps. For basis functions we used the set {l,y,y'^, pfX, ptg, ptEUR}, 
where EUR{t,(,,y) = K^'^'y[e~^^'^~^^g{Xj',YT)] is the conditional expectation of terminal payoff. A 
straightforward code written in Matlab with minimal optimization took about three minutes to run 
on a desktop PC. The pde solver utilized a basic explicit scheme and used a 400 x 400 grid with 
8000 timesteps. In order to allow a fair comparison, the pde solver also allowed only T/At = 20 
exercise opportunities by enforcing the barrier condition V{t, m,p, y) > G{m,p, y) only for t = mAt, 
m = 0, 1, . . . 20. In financial lingo, we thus studied the Bermudan variant of (41) with At = 0.05. 

The obtained results are summarized in Table 1 for a variety of initial conditions (^^O)^)- Using 
the pde solver as a proxy for the true answer, we find that our algorithm was generally within 2% of 
the correct value which is acceptable performance. Interestingly, our algorithm performed worst for 
"in-the-money" options (such as when Yq = 1.8, ~ AA(0.2, 0.05^)), i.e. when it is optimal to stop 
early. As expected, our method produced an underestimate of true V since the computed stopping 
rule is necessarily sub-optimal. We found that the distribution of the computed r* was quite uniform 
on {At, 2At, . . . , (M — l)At} showing that this was a nontrivial stopping problem. For comparison. 
Table 1 also lists the European option price assuming that early exercise is no longer possible. This 
column shows that our algorithm captured about 85-90% of the time value of money, i.e. the extra 
benefit due to early stopping. 
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-0.3 -0.2 -0.1 01 02 0.3 



Figure 1. Comparison of optimal stopping regions for the pde and Monte Carlo 
solvers. The solid line shows the optimal stopping boundary as a function of mt; the 
color-coded points show the values z'j{t), j = 1,. . .,N projected onto E[X(|J^^^] = 
{ptx) ■ iptl)-^. Here Xq ~ Ar(0, O.OS^), Yq = 2, N = 30 000 and t = 0.5. 



To further illustrate the structure of the solution, Figure 1 compares the optimal stopping regions 
computed by each algorithm at a fixed time point t = 0.5. Note that since the value function 
V is typically not very sensitive to the choice of a stopping rule, direct comparison of optimal 
stopping regions is more relevant (and more important for a practicing controller). As we can see, 
an excellent fit was obtained through our non-parametric method. Figure 1 also reveals that both 
{qt > Gt} r\ {qt < Gt} and {qt < Gt} H {qt > Gt} were non-empty (in other words, sometimes our 
algorithm stopped too early; sometimes it stopped too late). Recall that the simulation solver works 
under P and therefore the empirical distribution of Yt in Figure 1 would be different from the actual 
realizations under P that will be observed by the controller. 

While the pde formulation (42)-(44) is certainly better for the basic example above, it is crucially 
limited in its applicability. For instance, (42) assumes Gaussian initial condition; any other 
renders it invalid. Similarly, perturbations to the dynamics (40) will at the very least require re- 
derivation of (42)-(44), or more typically lead to the case where no finite-dimensional sufficient 
statistics of XtlJ-f" exist. In stark contrast to such difficulties, the particle filter algorithm can be 
used without any modifications for any (^q, and would need only minor adjustments to accommodate 
a different version of (40). A simple illustration is shown in the last two rows of Table 1 where 
we consider a uniform and a discrete initial distribution, respectively. Heuristically, V should be 
increasing with respect to the kurtosis of as a more spread-out initial distribution of Xt leads 
to more optionality. Hence, (as confirmed by Table 1), V{0, ,yo) < V{0,^'^,yo) < V{0,^'^,yo), 
where = Ar(0, 0.05^), ^ Unif or m{[-0. 05 Vs, 0.05 V3]),£,^ = 0.5((5„o.05 + -^o.os) are three initial 
distributions of X normalized to x^^{dx) = 0, x'^^^{dx) = 0.05^. 
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yo 


Simulation solver 


pde solver 


European option 


AA(0, 0.052) 


2 


0.1810 


0.1853 


0.1331 


7\A(-0.12,0.052) 


2.24 


0.2566 


0.2661 


0.2136 


AA(0.2,0.052) 


1.8 


0.1862 


0.1904 


0.1052 


AA(0,0.l2) 


2 


0.1852 


0.1919 


0.1349 


So 


2 


0.1723 


0.1832 


0.1325 


Unif [_o.05\/3,0.05v^] 


2 


0.1827 




0.1347 


0.5((5_o.05 + <5o.05) 


2 


0.1853 




0.1332 



Table 1 . Comparison of the Monte Carlo scheme of Section 3.3 versus the Bermudan 
pde solver for the stochastic drift example of Section 5. Standard error of the Monte 
Carlo solver was about 0.001. 



6. American Option Pricing under Stochastic Volatility 

Our method can also be applied to stochastic volatility models. Such asset pricing models are 
widely used in financial mathematics to represent stock dynamics and assume that the local volatility 
of the underlying stock is itself stochastic. While under continuous observations the local volatility 
is perfectly known through the quadratic variation process, under discrete observations this leads to 
a partially observed model similar to (3). 

To be concrete, let Yt represent the log-price of a stock at time t under the given (pricing) measure 
P, and let Xt be the instantaneous volatility of Y at time t. We postulate that {X, Y) satisfy the 
following system of sde's (known as the Stein-Stein model), 

f dYt = {r-\x^)dt + XtdUt, 

(45) < ^ 

[ dXt = K{a - Xt) dt + pa dUt + \/\ - p^a dWt- 

The stock price Y is only observed at the discrete time instances T = {At,2Ai, . . .} with = 
a{Yo, Ya*! ■ ■ ■ , Y^t/j\t\At)- The American (Put) option pricing problem consists in finding the optimal 
^^-adapted and T-valued stopping time r for 

(46) supE[e-'^^(if -e^-)+]. 

A variant of (45)-(46) was recently studied by Sellami et al. [36]. More precisely, [36] considered 
the American option pricing model in a simplified discrete setting where (Xt) of the Stein-Stein 
model (45) was replaced with a corresponding 3-state Markov chain approximation. In a related 
vein, Viens et al. [41] considered the filtering and portfolio optimization problem where the second 
line of (45) was replaced with the Heston model 



(47) 



d{X^) = K{a - X^) dt + paXtdUt + y^l^aXtdWt- 
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In general, the problem of estimation of Xt is well-known, see e.g. [10, 14, 39] . Observe that while (45) 
is linear, the square-root dynamics in (47) are highly non-linear and no finite-dimensional sufficient 
statistics exist for vrf in the latter case. 

In the presence of stochastic volatility, one may no longer use the reference probability measure 
P. Indeed, there is no way to obtain a Brownian motion from the observation process Y whose 
increments are now tied with the values of the unobserved X. Accordingly, C, is no longer defined 
and consequently we cannot use it as an importance weight during the particle branching step in 
(24). 

A way out of this difficulty is provided by Del Moral et al. [12]. The idea is to propagate particles 
independently of observations and to compute a candidate observation for each propagated particle. 
The weights are then assigned by comparing the candidates with the actual observation. Let (f) be 
a smooth bounded function with J^cji^x) dx = 1 and \x\4>{x) dx < oo (e.g. (j){x) = exp(— x^/2) • 
(27r)^^/^). The propagated particles and candidates are obtained by 

{s))ds+ / padU^^^ + / ^/l 

J mAt J mAt J mAt 

Y^^\t) = f (r - \{v]{t)f) dt + f v^'is) dUP, mAt < t < {m + l)At, 

JmAt 2 JmAt 

where {U^^\w'-^^)1^ I are n independent copies of bivariate Wiener processes. The branching weights 
are then given by 

(48) d]{{m + l)At) = nV3^ {n'/Hyl^_^i)At " V+i)At)) • 

(7) 

Hence, particles whose candidates ^(^^.^^^^ are close to the true observed 5^(m+i)At g^t high weights, 
while those particles that produced poor candidates are likely to be killed off. The rest of the 
algorithm remains the same as in Section 3.1. As shown in [12, Theorem 5.1], the resulting filter 
satisfies for any bounded payoff function / G C^{W^) 

(49) E[Kr/-7r</|]<^||/||o,oo, with C(t) = 0(e*). 

Note that compared to (28), the error in (49) as a function of number of particles n is worse. This 
is due to the higher re-sampling variance produced by the additional randomness in y^-^^'s. 

6.1. Numerical Example. Recently [36] considered the above model (45) with the parameter val- 
ues 



Parameter 



Yq Xq K k a a T r p 



Value 110 0.15 100 1 0.15 0.1 1 0.05 
Plugging-in the above parameters and using the modification (48), we implemented our algorithm 
with N = 30, 000, n = 1000. Since no other solver of (46) is available, as in [36] we compare the 
Monte Carlo solver of the partially-observed problem to a pde solver for the fully observed case (in 
which case the Bermudan option price is easily computed using the quasi-variational formulation 
based directly on (45)). Table 2 shows the results as we vary the observation frequency At. Since 
At is also the frequency of the stopping decisions, smaller At increases both the partially and fully 
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observed value functions. Moreover, as At gets smaller, the information set becomes richer and the 
handicap of partial information vanishes. 

In this example where the payoff K — exp(Yt) is a function of the observable Y only, our algorithm 
obtains excellent performance. Also, we see that partial information has apparently only a mild 
effect on potential earnings (difference of less than 1.5% even if Y is observed just five times). To 
give an idea of the corresponding time value of money, the European option price in this example 
was 1.570. Comparison with the results obtained in [36] (first two columns of Table 2) is complicated 
because the latter paper immediately discretizes X and constructs a three-state Markov chain (Xt). 
This discrete version takes on the values X^At S {0.1,0.15,0.2} and therefore does not exhibit the 
asymmetric behavior of very small Xt realizations that dampen the volatility of Y and drastically 
reduce Put profits. In contrast, our algorithm operates on the original continuous-state formulation 
in (45). Consequently, as can be seen in Table 2, the full observation prices of the two models are 
quite different. 





Discrete Model 


Continuous Model 


At 


Fuh Obs. 


Partial Obs. 


Fuh Obs. 


Partial Obs. 


0.2 


1.575 


0.988 


1.665 


1.646 


0.1 


1.726 


1.306 


1.686 


1.673 


0.05 


1.912 


1.596 


1.696 


1.685 



Table 2. Comparison of discrete and continuous models for (45) under full and 
partial observations. The first two columns are reproduced from [36, Table 3]. 



7. Conclusion 

In this paper we have presented a new numerical scheme to solve partially observable optimal 
stopping problems. Our method is entirely simulation-based and only requires the ability to simulate 
the state processes. Consequently, we believe it is more robust than other proposals in the existing 
literature. 

While our analysis was stated in the most simple setting of multi-dimensional diffusions, it can be 
considerably extended. First, as explained in Section 6, our algorithm can be easily adjusted to take 
into account discrete observations which is often the more realistic setup. Second, the assumption of 
diffusion state processes is not necessary from a numerical point of view; one may consider other cases 
such as models with jumps, or even discrete-time formulations given in terms of general transition 
semigroups. For an example using a particle filter to filter a stable Levy process X, see [22]. Third, 
one may straightforwardly incorporate state constraints on the unobserved factor X. For instance, 
some applications imply that > is an extra constraint on top of (1) (in other words the observable 
filtration is generated by Y and l{Xt>o})- Such a restriction can be added by assigning zero weights 
to particles that violate state constraints so that they are not propagated during the next branching 
step. Finally, if one uses the modification (48) from [12] then many other noise formulations can be 
chosen beyond (2). 
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